
Add async support for dspy.Evaluate #8504


Open

chenmoneygithub wants to merge 2 commits into main.

Conversation

chenmoneygithub (Collaborator):

We are introducing:

  • _execute_with_multithreading for running the evaluation with multithreading.
  • _execute_with_event_loop for running the evaluation with async concurrency (a rough sketch follows this list).
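
For reference, a rough sketch of the event-loop path (a simplified illustration rather than the exact implementation; it assumes the program exposes an async acall entry point and that devset holds dspy.Example objects):

import asyncio

async def _execute_with_event_loop_sketch(program, devset, num_threads):
    # All work items are queued up front, then `num_threads` coroutine
    # workers drain the queue inside a single event loop.
    queue = asyncio.Queue()
    results = [None] * len(devset)

    for index, example in enumerate(devset):
        queue.put_nowait((index, example))

    async def worker():
        while True:
            try:
                index, example = queue.get_nowait()
            except asyncio.QueueEmpty:
                break
            try:
                # Assumed async entry point on the program being evaluated.
                results[index] = await program.acall(**example.inputs())
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(num_threads)]
    await asyncio.gather(*workers)
    return results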

The weird part is that async eval should technically run faster than multithreading, since evaluation is an IO-bound task, but I am not seeing that consistently.

The testing script is pasted below:

import asyncio
import time

import dspy
from dspy.datasets.gsm8k import GSM8K

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini", cache=False))


# Load math questions from the GSM8K dataset.
gsm8k = GSM8K()
gsm8k_trainset, gsm8k_devset = gsm8k.train[:50], gsm8k.dev[:100]


cot = dspy.ChainOfThought("question->answer")


# Exact-match metric comparing the predicted answer to the gold answer.
def my_metric(args, pred):
    return 1.0 if pred.answer == args.answer else 0.0


evaluator = dspy.Evaluate(devset=gsm8k_devset, num_threads=50, display_table=False)


start_time = time.time()
result = evaluator(cot, metric=my_metric)
end_time = time.time()
print(f"Time taken with multithreading: {end_time - start_time} seconds")
print(result)


async def main():
    return await evaluator.acall(cot, metric=my_metric)


start_time = time.time()
result = asyncio.run(main())
end_time = time.time()
print(f"Time taken with async: {end_time - start_time} seconds")
print(result)

About 70% of the time async is insignificantly faster than multithreading, and about 30% of the time it's the reverse. Two potential theories:

  • The bottleneck is on the provider side, e.g., rate limiting.
  • Behind the scenes, litellm's acompletion is not truly async; I need to dig into the code a bit (a quick timing check is sketched below).
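
To sanity-check the second theory, one option is to time a small batch of acompletion calls run sequentially versus via asyncio.gather; if they truly overlap, the concurrent batch should take roughly the time of a single request (a minimal sketch, model name only illustrative):

import asyncio
import time

import litellm

async def check_acompletion_concurrency(n=10):
    messages = [{"role": "user", "content": "Say hi."}]

    # Sequential baseline: n requests awaited one by one.
    start = time.time()
    for _ in range(n):
        await litellm.acompletion(model="openai/gpt-4o-mini", messages=messages)
    sequential = time.time() - start

    # Concurrent batch: n requests scheduled together on the event loop.
    start = time.time()
    await asyncio.gather(
        *[litellm.acompletion(model="openai/gpt-4o-mini", messages=messages) for _ in range(n)]
    )
    concurrent = time.time() - start

    print(f"sequential: {sequential:.1f}s, concurrent: {concurrent:.1f}s")

asyncio.run(check_acompletion_concurrency())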

queue.task_done()

workers = [asyncio.create_task(worker()) for _ in range(num_threads)]
await asyncio.gather(*workers)

TomeHirata (Collaborator) commented on Jul 8, 2025:

q: Does asyncio.gather use multiple threads with this setup?

chenmoneygithub (Collaborator, Author) replied:

It doesn't use multiple threads, but it does run those async workers concurrently within the same event loop.
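
A quick way to confirm this: have each coroutine print its thread id, and they all report the same one.

import asyncio
import threading

async def worker(i):
    await asyncio.sleep(0.1)
    # Every coroutine runs on the event loop's single thread.
    print(f"worker {i} on thread {threading.get_ident()}")

async def main():
    await asyncio.gather(*[worker(i) for i in range(5)])

asyncio.run(main())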

num_threads: int,
):
queue = asyncio.Queue()
results = [None for _ in range(len(devset))]

TomeHirata (Collaborator) commented:

nit:

Suggested change:
- results = [None for _ in range(len(devset))]
+ results = [None] * len(devset)

chenmoneygithub (Collaborator, Author) replied:

done!

okhat (Collaborator) commented on Jul 9, 2025:

Wow so cool!

okhat (Collaborator) commented on Jul 9, 2025:

Should this be added to dspy.Evaluate or deeper, like dspy.ParallelExecutor?

chenmoneygithub (Collaborator, Author) replied:

Should this be added to dspy.Evaluate or deeper, like dspy.ParallelExecutor?

_execute_with_event_loop is a simplified version of dspy.ParallelExecutor that just focuses on getting dspy.Evaluate to work, so I would keep it within dspy.Evaluate.

I am not quite sure we should have an async version of dspy.ParallelExecutor: async doesn't involve thread management, and it's pretty straightforward for users to call asyncio.gather() themselves to utilize the event loop (see the sketch below). Then the only benefit would be the progress bar tracking 🤔
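
For illustration, a user-side equivalent with bounded concurrency (a minimal sketch, assuming the program exposes acall and that devset holds dspy.Example objects):

import asyncio

async def run_all(program, devset, max_concurrency=50):
    # A semaphore bounds in-flight requests, playing the role that the
    # thread pool size plays in the multithreaded path.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(example):
        async with semaphore:
            return await program.acall(**example.inputs())

    return await asyncio.gather(*[run_one(example) for example in devset])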
